Collaborator

@jlamypoirier jlamypoirier commented Nov 27, 2025

✨ Description

  • Add HuggingfaceMultiModalModelForCausalLM, a Hugging Face wrapper for multimodal models following the llava format.
  • Integrate the content of HuggingfaceBaseModelForCausalLM into HuggingfacePreTrainedModel and generalize it to arbitrary inputs.
  • Rework output_hidden_states into an extensive debugging utility built on the existing DebugLayer. When calling the model, one may "request" specific hidden states by providing a list of names in kwargs["output_hidden_states"] (output_hidden_states in the hf wrapper). The hidden states whose names match (using regex) are returned in kwargs["hidden_states"]. This is still experimental but has already helped a lot with debugging. Ex:
    >>> model_fast_llm(test_input, pixel_values=pixels, output_hidden_states=["vision_encoder.encoder.0.mixer", "head.logits"])
    CausalLMOutputWithPast(loss=None, logits=tensor(...), past_key_values=[], hidden_states=
    {'vision_encoder.encoder.0.mixer.query_rotary_input': tensor(...),  'vision_encoder.encoder.0.mixer.key_rotary_input': tensor(...), 
    'vision_encoder.encoder.0.mixer.query': tensor(...), 'vision_encoder.encoder.0.mixer.key': tensor(...), 
    'vision_encoder.encoder.0.mixer.value': tensor(...),  'vision_encoder.encoder.0.mixer.context': tensor(...), 
    'vision_encoder.encoder.0.mixer': tensor(...), 'head.logits': tensor(...)}, attentions=None)
    
  • Replace the patch "convolution" by a simpler linear layer.
  • Add support for linear layers without input gradients (e.g. vision embeddings).
  • Fix patch ordering in get_patches_from_images.
  • Add missing causal and cross_document_attention in llava conversion.
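
To illustrate how the names requested through output_hidden_states could select the returned tensors, here is a minimal sketch. It is not the Fast-LLM implementation: the helper `select_hidden_states` is hypothetical, and the use of `re.match` (anchored at the start of the name only) is an assumption, chosen because it is consistent with the example output above, where requesting "vision_encoder.encoder.0.mixer" also returns its sub-entries such as ".query" and ".key".

```python
import re


def select_hidden_states(available, requested):
    """Return the entries of `available` whose name matches any requested pattern.

    Hypothetical helper for illustration only. Patterns are matched from the
    start of the name (re.match), so a pattern also selects longer names that
    begin with it. This matching rule is an assumption, not the PR's code.
    """
    patterns = [re.compile(pattern) for pattern in requested]
    return {
        name: value
        for name, value in available.items()
        if any(pattern.match(name) for pattern in patterns)
    }


# Placeholder values stand in for tensors.
available = {
    "vision_encoder.encoder.0.mixer": "t0",
    "vision_encoder.encoder.0.mixer.query": "t1",
    "vision_encoder.encoder.1.mixer": "t2",
    "head.logits": "t3",
}
selected = select_hidden_states(
    available, ["vision_encoder.encoder.0.mixer", "head.logits"]
)
print(sorted(selected))
```

Since the names are treated as regular expressions, an unescaped "." matches any character. Wrapping a name in `re.escape` is one way to request a literal match if that distinction matters.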

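As background for replacing the patch "convolution" with a linear layer: a convolution whose stride equals its kernel size acts on non-overlapping patches, so it gives the same result as a linear layer applied to each flattened patch. The sketch below checks this in pure Python on a single-channel image with one output feature. All function names are hypothetical, the row-major patch order is an assumption made for the illustration (it is not claimed to be the ordering used by get_patches_from_images), and image dimensions are assumed to be divisible by the patch size.

```python
def extract_patches(image, patch_size):
    """Split a 2D image into flattened, non-overlapping patches (row-major order)."""
    patches = []
    for top in range(0, len(image), patch_size):
        for left in range(0, len(image[0]), patch_size):
            patches.append([
                image[top + i][left + j]
                for i in range(patch_size)
                for j in range(patch_size)
            ])
    return patches


def strided_convolution(image, kernel):
    """Single-channel convolution with stride equal to the kernel size."""
    size = len(kernel)
    return [
        [
            sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(size)
                for j in range(size)
            )
            for left in range(0, len(image[0]), size)
        ]
        for top in range(0, len(image), size)
    ]


def linear(patches, weight):
    """Apply a single-output linear layer (a dot product) to each flattened patch."""
    return [sum(x * w for x, w in zip(patch, weight)) for patch in patches]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 2], [3, 4]]

conv_out = strided_convolution(image, kernel)
# Flattening the kernel in the same row-major order yields the linear weight.
flat_weight = [w for row in kernel for w in row]
linear_out = linear(extract_patches(image, 2), flat_weight)

print(conv_out)    # [[44, 64], [124, 144]]
print(linear_out)  # [44, 64, 124, 144]
```

The two outputs hold the same values, which is why the linear form can stand in for the convolution once patches are extracted up front. The patch order then determines the order of the resulting embeddings.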
@jlamypoirier jlamypoirier marked this pull request as ready for review November 28, 2025 01:46
Collaborator

@tscholak tscholak left a comment

LGTM!

@tscholak tscholak merged commit 85cdd69 into main Dec 1, 2025
4 checks passed
@tscholak tscholak deleted the jlp/vision_huggingface branch December 1, 2025 16:39

3 participants